Parks and Recreation was a television comedy that aired on NBC from 2009 to 2015. I obtained the complete transcripts and performed text analysis on the show's dialogue.
Citation for dataset: He, Luke. (2019, November 23). Parks and Recreation Scripts. Link to data.
file_names <- list.files(here("scripts")) # file names for each episode
parks <- str_glue("scripts/{file_names}") %>%
map_dfr(read_csv) # read in all the episodes into one data frame!
# Tokenize lines to one word in each row
parks_token <- parks %>%
clean_names() %>%
unnest_tokens(word, line) %>% # tokenize
anti_join(stop_words) %>% # remove stop words
mutate(word = str_extract(word, "[a-z']+")) %>% # extract words only
drop_na(word) # take out missing values
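To make the tokenizing step concrete, here is the same recipe applied to a single made-up line (the line text is illustrative, not from the dataset):

```r
library(tibble)
library(tidytext)
library(dplyr)

demo <- tibble(character = "Leslie Knope",
               line = "Waffles are the best breakfast food")

demo %>%
  unnest_tokens(word, line) %>%       # lowercase, one word per row
  anti_join(stop_words, by = "word")  # common words like "are" and "the" drop out
```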
# Filter the top 10 characters with the most words
top_characters <- parks_token %>%
dplyr::filter(character != "Extra") %>%
count(character, sort = TRUE) %>%
slice_max(n, n = 10)
# Obtain words only from the top 10 characters
parks_words <- parks_token %>%
inner_join(top_characters) %>%
filter(!word %in% c("hey", "yeah", "gonna")) %>%
select(-n) %>%
count(word, character, sort = TRUE) %>%
group_by(character) %>%
slice_max(n, n = 9) # top 9 words per character
# Sample of a few lines from the show
parks %>%
slice_sample(n = 20) %>% # sample 20 lines without hard-coding the row count
kbl(caption = "<b style = 'color:white;'>
Sample of a few randomly chosen lines from Parks and Recreation.</b>") %>%
kable_material_dark(bootstrap_options = c("striped", "hover")) %>%
row_spec(0, color = "white", background = "#222222") %>%
scroll_box(width = "100%", height = "300px",
fixed_thead = list(enabled = TRUE, background = "#222222"))
| Character | Line |
|---|---|
| Tom Haverford | Man, I really got things figured out. |
| Jerry Gergich | As you may know, I do like to tinker with things in my garage. |
| April Ludgate | You’re unpredictable, complex, and hard to read. |
| Extra | You’ve been served. |
| Joe | You’re welcome. |
| Tom Haverford | " Everything you just said makes me like me more. |
| Tom Haverford | 600 |
| Leslie Knope | So, you don’t have any idea what you want to do? |
| Ron Swanson | I’d love for you to stick around, Tommy. |
| Leslie Knope | And then it gets a little unpleasant. |
| Leslie Knope | Oh! |
| Leslie Knope | The best way to be safe is to simply postpone sex until marriage. |
| Joan Callamezzo | Looking at you, Leslie. |
| Music | Yes, I pray that you do love me too |
| Ian Winston | Leslie said I should come over here. |
| Leslie Knope | But I’m just gonna spend the remainder of my term cramming in as many good projects as I can. |
| Tom Haverford | The four sweetest words in the English language, “You wore me down.” |
| Leslie Knope | His mom’s gonna be here soon. |
| Tom Haverford | Oh, no. |
| Ann Perkins | But in the meantime, I’m just gonna tell you in English. |
It’s difficult to choose a favorite character from Parks and Rec, so I plotted the nine most frequently used words for each of the ten characters with the most dialogue. Some words that will resonate with fans of the show are Chris Traeger’s literally, Jerry (Garry) Gergich’s geez, and Ben Wyatt’s uh.
ggplot(data = parks_words,
aes(x = n, y = word, fill = n)) +
geom_col(show.legend = FALSE) +
scale_fill_viridis_c(option = "plasma") +
facet_wrap(~character, scales = "free") +
theme_brooklyn99() +
theme(panel.grid.major.y = element_blank(),
axis.text.x = element_text(size = 9),
axis.text.y = element_text(size = 9.5),
axis.title = element_blank(),
panel.grid.minor = element_blank(),
strip.text = element_text(color = "white",
face = "bold",
size = 10.5))
Below are four word clouds of the 25 most frequently used words by the following characters, starting from the upper-left corner and going clockwise: Andy Dwyer, April Ludgate, Ron Swanson, and Leslie Knope. We can see Andy Dwyer’s enthusiasm for karate and band, Leslie Knope’s love for pawnee, city, and parks, and Ron Swanson’s contempt for government and his two ex-wives, both named tammy.
# Ron Swanson
swanson_words <- parks_token %>%
filter(character == "Ron Swanson") %>% # filter for character
filter(!word %in% c("hey", "yeah", "gonna")) %>% # remove some more stopwords
count(word) %>%
slice_max(n, n = 25) # choose top 25 words
swanson_pic <- jpeg::readJPEG(here("images","ron_swanson.jpg"))
swanson_cloud <- ggplot(data = swanson_words,
aes(label = word)) +
background_image(swanson_pic) + # add image of character
geom_text_wordcloud(aes(size = n),
color = "turquoise1",
shape = "circle") +
scale_size_area(max_size = 6) +
theme_void()
# Leslie Knope
knope_words <- parks_token %>%
filter(character == "Leslie Knope") %>%
filter(!word %in% c("hey", "yeah", "gonna")) %>% # remove some more stopwords
count(word) %>%
slice_max(n, n = 25)
knope_pic <- jpeg::readJPEG(here("images", "knope.jpg"))
knope_cloud <- ggplot(data = knope_words,
aes(label = word)) +
background_image(knope_pic) +
geom_text_wordcloud(aes(size = n),
color = "turquoise1",
shape = "star") +
scale_size_area(max_size = 6) +
theme_void()
# April Ludgate
april_words <- parks_token %>%
filter(character == "April Ludgate") %>%
filter(!word %in% c("hey", "yeah", "gonna")) %>% # remove some more stopwords
count(word) %>%
slice_max(n, n = 25)
april_pic <- jpeg::readJPEG(here("images", "april.jpeg"))
april_cloud <- ggplot(data = april_words,
aes(label = word)) +
background_image(april_pic) +
geom_text_wordcloud(aes(size = n),
color = "turquoise1",
shape = "triangle-upright") +
scale_size_area(max_size = 6) +
theme_void()
# Andy Dwyer
andy_words <- parks_token %>%
filter(character == "Andy Dwyer") %>%
filter(!word %in% c("hey", "yeah", "gonna")) %>% # remove some more stopwords
count(word) %>%
slice_max(n, n = 25)
andy_pic <- jpeg::readJPEG(here("images", "andy.jpg"))
andy_cloud <- ggplot(data = andy_words,
aes(label = word)) +
background_image(andy_pic) +
geom_text_wordcloud(aes(size = n),
color = "turquoise1",
shape = "diamond") +
scale_size_area(max_size = 6) +
theme_void()
# Final patchwork of word clouds
patchwork <- (andy_cloud + april_cloud) / (knope_cloud + swanson_cloud)
patchwork & theme(plot.background = element_rect(fill = "#222222",
color = "#222222"),
strip.background = element_rect(fill = "#222222",
color = "#222222"))
Using the nrc lexicon, which bins 13,901 words into eight emotions and also labels each word as positive or negative, I plotted the counts of each sentiment for the same ten characters. All of the characters shown here use more positive than negative words, and all of them use words associated with trust and anticipation.
Citation for NRC lexicon: Crowdsourcing a Word-Emotion Association Lexicon, Saif Mohammad and Peter Turney, Computational Intelligence, 29 (3), 436-465, 2013. nrc lexicon
characters_sent <- parks_token %>%
inner_join(top_characters) %>%
filter(!word %in% c("hey", "yeah", "gonna")) %>%
select(-n) %>%
inner_join(get_sentiments("nrc")) %>%
count(sentiment, character, sort = TRUE)
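The join above works because `get_sentiments("nrc")` returns a plain two-column tibble of word–sentiment pairs, one row per combination, so a single word can feed several sentiment bars at once. A quick way to see this for one word (chosen here only as an illustration):

```r
library(tidytext)
library(dplyr)

# One row per word–sentiment pair: a word joined into the dialogue
# is counted once for every sentiment it belongs to
get_sentiments("nrc") %>%
  filter(word == "government")
```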
ggplot(data = characters_sent,
aes(x = n, y = sentiment, fill = n)) +
geom_col(show.legend = FALSE) +
scale_fill_viridis_c(option = "plasma") +
facet_wrap(~character, scales = "free") +
theme_brooklyn99() +
theme(panel.grid.major.y = element_blank(),
axis.text.x = element_text(size = 7.5),
axis.text.y = element_text(size = 9.5),
axis.title = element_blank(),
panel.grid.minor = element_blank(),
strip.text = element_text(color = "white",
face = "bold",
size = 9.3))
Parks and Recreation is a hilarious comedy with many enjoyable characters, so it’s no surprise that the average sentiment is positive for most of the show. Using the AFINN lexicon, which assigns each word a score between -5 (most negative) and 5 (most positive), I computed a moving average with a window size of 151 words and plotted it across the entire run of the show.
Citation for AFINN lexicon: AFINN, Nielsen, Finn Årup. Informatics and Mathematical Modelling, Technical University of Denmark. March 2011. AFINN lexicon
parks_afinn <- parks_token %>%
inner_join(get_sentiments("afinn")) %>%
drop_na(value) %>%
mutate(index = row_number()) %>% # make an index
mutate(moving_avg = slide_dbl(value, # centered moving average
mean,
.before = (151 - 1) / 2,
.after = (151 - 1) / 2)) %>%
mutate(neg_pos = factor(case_when(
moving_avg > 0 ~ "Positive",
moving_avg <= 0 ~ "Negative"
)))
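As a sanity check on the window arithmetic: `.before = (151 - 1)/2` and `.after = (151 - 1)/2` put 75 words on each side of the current word, giving a centered window of 151. The same pattern on a toy vector with a window of 5 (using the slider package, as above):

```r
library(slider)

x <- 1:5
# 2 values before, the value itself, 2 after; windows are
# truncated at the edges rather than padded with NA
slide_dbl(x, mean, .before = 2, .after = 2)
#> 2.0 2.5 3.0 3.5 4.0
```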
sent_plot <- ggplot(data = parks_afinn, aes(x = index, y = moving_avg)) +
geom_col(aes(fill = neg_pos)) +
scale_fill_manual(values = c("Positive" = "springgreen2",
"Negative" = "darkred"))+
theme_minimal() +
labs(x = "Index",
y = "Moving Average AFINN Sentiment",
fill = "") +
theme(panel.grid.minor.y = element_blank(),
panel.grid.major.x = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_text(size = 11,
face = "bold",
color = "white"),
axis.title.y = element_text(color = "white",
size = 12,
face = "bold"),
axis.title.x = element_blank(),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "#222222",
color = "#222222"),
strip.background = element_rect(fill = "#222222",
color = "#222222"),
legend.text = element_text(color = "white",
size = 11,
face = "bold"))
sent_plot
I decided to take a closer look at the sentiment throughout season 4, since it is one of the more popular seasons: Leslie Knope campaigns for a seat on the city council of Pawnee, Indiana. Here I used a moving-average window of 51 words for the AFINN sentiment values. For most of the season the average sentiment is positive, except for a noticeable drop near the end, where the score falls to around -1.
file_names_season <- str_sub(file_names, start = 3L)
# used this line of code to easily find the episode number of each season
# which(file_names_season == "e01.csv")
season_4 <- str_glue("scripts/{file_names[47:68]}") %>% # files 47-68 are the season 4 episodes
map_dfr(read_csv)
# Tokenize lines to one word in each row
season_token <- season_4 %>%
clean_names() %>%
unnest_tokens(word, line) %>% # tokenize
anti_join(stop_words) %>% # remove stop words
mutate(word = str_extract(word, "[a-z']+")) %>% # extract words only
drop_na(word) # take out missing values
season_afinn <- season_token %>%
inner_join(get_sentiments("afinn")) %>%
drop_na(value) %>%
mutate(index = row_number()) %>%
mutate(moving_avg = slide_dbl(value, # centered moving average
mean,
.before = (51 - 1) / 2,
.after = (51 - 1) / 2))
season_plot <- ggplot(data = season_afinn, aes(x = index, y = moving_avg)) +
geom_col(aes(fill = moving_avg)) +
scale_fill_carto_c(type = "diverging",
palette = "Earth") +
theme_minimal() +
labs(x = "Index",
y = "Moving Average AFINN Sentiment",
fill = "") +
theme(panel.grid.minor.y = element_blank(),
panel.grid.major.x = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_text(size = 11,
face = "bold",
color = "white"),
axis.title.y = element_text(color = "white",
size = 12,
face = "bold"),
axis.title.x = element_blank(),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "#222222",
color = "#222222"),
strip.background = element_rect(fill = "#222222",
color = "#222222"),
legend.text = element_text(color = "white",
size = 11,
face = "bold"))
season_plot
Digging into the data, I found that this dip occurred during the penultimate episode of the season, “Bus Tour”. The episode starts with Leslie Knope behind in the polls to her opponent in the city council race, Bobby Newport. During one of her campaign stops, in response to a reporter’s question, Leslie starts saying disparaging things about Bobby’s father. After she finishes, the reporter informs her that the question was whether she had any comment on his death earlier that day. Meanwhile, to get people to the polls, Leslie’s team tries to secure vans to transport potential voters, but Bobby Newport’s team has already reserved every van in the city. Most of the episode is thus spent doing damage control for the mishaps of Leslie and her campaign team. Below are the words with AFINN ratings during this dip in sentiment in season 4.
# Investigate the negative dip of the plot
season_afinn_neg <- season_afinn %>%
filter(moving_avg < -0.75) %>%
slice(-c(1:2)) %>%
select(-index) %>%
rename('moving average' = moving_avg)
# How I figured out which episode it was
season_4_subset <- season_4 %>%
filter(Character == "Bill")
# Table of words
season_afinn_neg %>%
kbl(caption = "<b style = 'color:white;'>
What was happening towards the end of season 4 of Parks and Recreation when things went south?</b>") %>%
kable_material_dark(bootstrap_options = c("striped", "hover")) %>%
row_spec(0, color = "white", background = "#222222") %>%
scroll_box(width = "100%", height = "300px",
fixed_thead = list(enabled = TRUE, background = "#222222"))
| character | word | value | moving average |
|---|---|---|---|
| Bill | grand | 3 | -0.7647059 |
| Tom Haverford | demands | -1 | -0.7647059 |
| Tom Haverford | crying | -2 | -0.8431373 |
| Leslie Knope | promise | 1 | -0.8235294 |
| Leslie Knope | stop | -1 | -0.8823529 |
| Leslie Knope | intimidating | -2 | -0.8823529 |
| Leslie Knope | bullying | -2 | -0.9019608 |
| Leslie Knope | jerk | -3 | -0.9019608 |
| Leslie Knope | wrong | -2 | -0.9019608 |
| Leslie Knope | died | -3 | -0.8627451 |
| Leslie Knope | sad | -2 | -0.9215686 |
| Extra | sad | -2 | -0.9215686 |
| Leslie Knope | bummer | -2 | -0.8039216 |
| Leslie Knope | jerk | -3 | -0.7647059 |
| Perd Hapley | love | 3 | -0.7647059 |
| Jennifer Barkley | cancel | -1 | -0.7843137 |
| Leslie Knope | emergency | -2 | -0.8235294 |
| Leslie Knope | trust | 1 | -0.8431373 |
| Leslie Knope | died | -3 | -0.9019608 |
| Leslie Knope | awful | -3 | -0.8823529 |
| Leslie Knope | died | -3 | -0.7843137 |
| Ann Perkins | dead | -3 | -0.7843137 |
| Ann Perkins | jerk | -3 | -0.8823529 |
| Leslie Knope | jerk | -3 | -0.9411765 |
| Leslie Knope | polluted | -2 | -0.9803922 |
| Ben Wyatt | stop | -1 | -1.0980392 |
| Ben Wyatt | stop | -1 | -1.1372549 |
| Ann Perkins | fine | 2 | -1.1568627 |
| Ann Perkins | stop | -1 | -1.1960784 |
| Ann Perkins | apologize | -1 | -1.1960784 |
| Chris Traeger | worst | -3 | -1.1764706 |
| Chris Traeger | stop | -1 | -1.0980392 |
| Chris Traeger | stops | -1 | -1.0588235 |
| Chris Traeger | stop | -1 | -1.0392157 |
| Chris Traeger | stopping | -1 | -0.9607843 |
| Chris Traeger | death | -2 | -0.8627451 |
| Leslie Knope | beautiful | 3 | -0.8431373 |
| Leslie Knope | classy | 3 | -0.8235294 |
| Donna Meagle | free | 1 | -0.8235294 |
| Donna Meagle | huge | 1 | -0.7647059 |
| Bill | yeah | 1 | -0.8627451 |
| Bill | hell | -4 | -0.8039216 |
| Bill | free | 1 | -0.8039216 |
| Bill | pay | -1 | -0.7843137 |